61 research outputs found

    Direct Inter-Process Communication (dIPC): Repurposing the CODOMs architecture to accelerate IPC

    In current architectures, page tables are the fundamental mechanism that allows contemporary OSs to isolate user processes, binding each thread to a specific page table. A thread therefore cannot directly call another process's function or access its data; instead, the OS kernel provides data communication primitives and mediates process synchronization through inter-process communication (IPC) channels, which impede system performance. Alternatively, the recently proposed CODOMs architecture provides memory protection across software modules. Threads can cross module protection boundaries inside the same process using simple procedure calls, while preserving memory isolation. We present dIPC (for "direct IPC"), an OS extension that repurposes and extends the CODOMs architecture to allow threads to cross process boundaries. It maps processes into a shared address space, and eliminates the OS kernel from the critical path of inter-process communication. dIPC is 64.12× faster than local remote procedure calls (RPCs), and 8.87× faster than IPC in the L4 microkernel. We show that applying dIPC to a multi-tier OLTP web server improves performance by up to 5.12× (2.13× on average), and reaches over 94% of the ideal system efficiency.
    We thank Diego Marrón for helping with MariaDB, the anonymous reviewers for their feedback and, especially, Andrew Baumann for helping us improve the paper. This research was partially funded by HiPEAC through a collaboration grant for Lluís Vilanova (agreement number 687698 for the EU’s Horizon 2020 research and innovation programme), the Israel Science Foundation (ISF grant 769/12) and the Israeli Ministry of Science, Technology and Space.
    Peer Reviewed. Postprint (author's final draft).

    System noise, OS clock ticks, and fine-grained parallel applications

    As parallel jobs get bigger in size and finer in granularity, “system noise” is increasingly becoming a problem. In fact, fine-grained jobs on clusters with thousands of SMP nodes run faster if one processor per node is intentionally left idle, thus separating “system noise” from the computation. Paying a cost in average processing speed at a node for the sake of eliminating occasional process delays is (unfortunately) beneficial, as such delays are enormously magnified when one late process holds up thousands of peers with which it synchronizes. We provide a probabilistic argument showing that, under certain conditions, the effect of such noise is linearly proportional to the size of the cluster (as is often empirically observed). We then identify a major source of noise as the indirect overhead of periodic OS clock interrupts (“ticks”), which are used by all general-purpose OSs as a means of maintaining control. This is shown for various grain sizes, platforms, tick frequencies, and OSs. To eliminate such noise, we suggest replacing ticks with an alternative mechanism we call “smart timers”. This turns out to also be in line with the needs of desktop and mobile computing, increasing the chances of the suggested change being accepted.

    Analysis of the Task Superscalar architecture hardware design

    In this paper, we analyze the operational flow of two hardware implementations of the Task Superscalar architecture. The Task Superscalar is an experimental task-based dataflow scheduler that dynamically detects inter-task data dependencies, identifies task-level parallelism, and executes tasks out of order. We present a base implementation of the Task Superscalar architecture, as well as a new design with improved performance. We study how both the base and the improved hardware designs process dependent and non-dependent tasks, and compare the simulation results with those of the runtime implementation.
    This work is supported by the Ministry of Science and Technology of Spain and the European Union (FEDER funds) under contract TIN2007-60625, by the Generalitat de Catalunya (contract 2009-SGR-980), and by the European FP7 project TERAFLUX id. 249013, http://www.teraflux.eu. We would also like to thank the Xilinx University Program for its hardware and software donations.
    Postprint (author’s final draft).

    Contents

    Today's complex software systems are neither secure nor reliable. The rudimentary software protection primitives provided by current hardware force systems to run many distrusting software components (e.g., procedures, libraries, plugins, modules) in the same protection domain, or otherwise suffer degraded performance from address space switches. We present CODOMs (COde-centric memory DOMains), a novel architecture that can provide finer-grained isolation between software components with effectively zero run-time overhead, all at a fraction of the complexity of other approaches. An implementation of CODOMs in a cycle-accurate full-system x86 simulator demonstrates that with the right hardware support, finer-grained protection and run-time performance can peacefully coexist.
    We would like to thank Lluc Alvarez, Javier Cabezas, Ana Jokanovic, Marc Jorda, Carlos Villavieja, our shepherd Mohit Tiwari and the anonymous reviewers for their help and comments on this paper. This work has received funding from: the European Commission through TERAFLUX (FP7-249013) and RoMoL (GA-321253); the Spanish Government through Programa Severo Ochoa (SEV-2011-0067); the Spanish Ministry of Science and Technology through TIN2007-60625 and TIN2012-34557; the Israel Science Foundation (grant 769/12 and equipment grant 1719112); and the Ministry of Science and Technology, Israel. Yoav Etsion was supported by the Center for Computer Engineering at the Technion.
    Peer Reviewed. Postprint (published version).

    LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

    LEGaTO is a three-year EU H2020 project that started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.
    Peer Reviewed. Postprint (author's final draft).